A Consistent and Efficient Estimator for the Data-oriented Parsing Model

Author

  • Andreas Zollmann
Abstract

Given a sequence of samples from an unknown probability distribution, a statistical estimator aims to approximate that distribution using statistics computed from the samples. One desirable property of an estimator is that its estimate approaches the unknown distribution as the sample sequence grows large; this property is called consistency. This thesis presents the first non-trivial consistent estimator for the Data-Oriented Parsing (DOP) model. A consistency proof is given that addresses a gap in the current probabilistic grammar literature and can serve as the basis for consistency proofs for other estimators in statistical parsing. The thesis also demonstrates the computational and empirical superiority of the new estimator over the common DOP estimator DOP1: while achieving an exponential reduction in the number of fragments extracted from the treebank (and thus in parsing time), it also improves parsing accuracy over DOP1. Another formal property of estimators is bias. This thesis studies that property for the case of DOP and presents the somewhat surprising finding that every unbiased DOP estimator overfits the training data.
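The notion of consistency described in the abstract can be illustrated with a toy example. The sketch below is not the DOP estimator from the thesis: it assumes a simple three-outcome distribution (`TRUE_DIST`, hypothetical) and uses the plain relative-frequency estimator, measuring error by total variation distance, which is one possible choice and may differ from the measure used in the thesis. All function names are illustrative.

```python
import random
from collections import Counter

# Hypothetical "unknown" distribution, chosen only for illustration.
TRUE_DIST = {"a": 0.5, "b": 0.3, "c": 0.2}

def relative_frequency_estimate(samples):
    """Estimate a distribution by the relative frequency of each outcome."""
    counts = Counter(samples)
    n = len(samples)
    return {outcome: c / n for outcome, c in counts.items()}

def total_variation(p, q):
    """Total variation distance between two finite distributions."""
    outcomes = set(p) | set(q)
    return 0.5 * sum(abs(p.get(o, 0.0) - q.get(o, 0.0)) for o in outcomes)

def estimation_error(n, seed=0):
    """Draw n samples from TRUE_DIST and return the estimator's error."""
    rng = random.Random(seed)
    outcomes = list(TRUE_DIST)
    weights = [TRUE_DIST[o] for o in outcomes]
    samples = rng.choices(outcomes, weights=weights, k=n)
    return total_variation(TRUE_DIST, relative_frequency_estimate(samples))

if __name__ == "__main__":
    # For a consistent estimator the error should tend to shrink as n grows.
    for n in (10, 1_000, 100_000):
        print(n, round(estimation_error(n), 4))
```

Running the script prints the error for increasing sample sizes; for a consistent estimator such as this one, the error tends toward zero as the sample grows, although individual runs can fluctuate.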




Publication date: 2005